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We present a plane-wave ultrasoft pseudopotential implementation of first-principle molecular dy- 
namics, which is well suited to model large molecular systems containing transition metal centers. 
We describe an efficient strategy for parallelization that includes special features to deal with the 
augmented charge in the contest of Vanderbilt's ultrasoft pseudopotentials. We also discuss a simple 
approach to model molecular systems with a net charge and/or large dipole/quadrupole moments. 
We present test applications to manganese and iron porphyrins representative of a large class of bio- 
logically relevant metallorganic systems. Our results show that accurate Density-Functional Theory 
calculations on systems with several hundred atoms are feasible with access to moderate computa- 
tional resources. 
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I. INTRODUCTION 

There is increasing interest in studying the electronic structure of complex biological molecules. This is 
an essential step to understand e.g. enzymatic and/or biomimetic catalysis. Modeling bio-catalytic systems 
is however very challenging, because a proper description of the active site needs the inclusion of a large 
number of atoms (from several tens to a few hundreds) treated at a high level of quantum chemical theory. 
In this respect a good compromise in terms of accuracy and computational cost is provided by Density- 
Functional Theory (DFT) 1 , whose use to model the electronic structure of protein active sites is becoming 
increasingly popular. In combination with Car-Parrinello (CP) 2 first-principle molecular dynamics (MD), 
DFT allows us to optimize molecular structures, study dynamical and finite-temperature properties, and 
model reaction paths. 

In most standard implementations, the CP method employs a plane-wave (PW) basis set. An advantage 
of PWs is that they do not depend on atomic positions and are free of basis-set superposition errors. Total 
energies and forces on the atoms can be calculated using computationally efficient Fast Fourier transform 
(FFT) techniques. Finally, the convergence of a calculation can be controlled in a simple way, since it 
depends only upon the number of PWs included in the expansion of the electron density. The dimension of 
a PW basis set is controlled by a cutoff in the kinetic energy of the PWs, which is usually measured in Ry 
units. A disadvantage of PWs is their extremely slow convergence in describing core states. To deal with 
this difficulty, one usually employs norm-conserving (NC) pseudopotentials (PPs) 3 to model the interaction 
of the valence electrons with the ionic core (nucleus + core electrons). Parallel implementations of PW 
calculations based on NC PPs are well documented in the literature (see e.g4*^). 

When using NC PPs, very large PW basis sets are needed to accurately represent the contracted p orbitals 
of the first-row elements O, N, F, and the 3d orbitals of the transition metal block. These orbitals belong 
to elements for which the NC PPs are "hard", typically requiring cutoffs of more than 70 Ry in order to 
yield sufficiently converged results. As a comparison, calculations on elements like Al, Si, P, for which the 
corresponding NC PPs are "soft", are usually well converged with a cutoff of 20 Ry or less. A consequence 
of the delocalized nature of the PWs is that the presence of a single hard PP in a system requires the use of 
a correspondingly high cutoff for all the other PPs. This difficulty is particularly serious for metallorganic 
systems containing one or more transition metal centers. The high cutoff required for such atoms translates 
into a very large number of PWs, which in turn implies long execution times and large memory requirements. 

An approach that drastically reduces the PW cutoff was proposed by Vanderbilt 7 , who introduced "ul- 
trasoft" (US) PPs. In this approach the normalized charge density is the sum of two terms, a soft part 



3 



represented in terms of smooth orbitals, and a hard part which is treated as an augmented charge. A closely 
related approach, the "Projector-augmented wave" (PAW) method introduced by Blochl 8 , is an all-electron 
rather than a PP electronic structure method. The PAW approach provides a simple and effective algorithm 
for reconstructing all-electron orbitals from pseudo-orbitals 9 . An efficient serial implementation of the CP 
scheme with US PPs is described in Ref. 10 . 

In this paper we present in detail a parallel implementation of the CP scheme using US PPs. We also 
provide further details on the procedures used in serial implementation. We compare the relative effi- 
ciency of US and NC PPs in realistic calculations for large molecules, performed on parallel machines. 
Our test molecules are a reduced and an extended model of the active site of myoglobin, containing the iron- 
poiphyrin motif. We focus on metalloporphyrin systems because they are representative of a large class of 
biomolecules which can be modeled efficiently with US PPs. Our benchmarks indicate that US calculations 
are at least 2-3 times less expensive than NC calculations of comparable accuracy. 

The use of a PW basis set implies that Periodic Boundary Conditions (PBC) are imposed, i.e. an iso- 
lated molecule has to be placed into a periodically repeated box (a "supercell"). The supercell must be 
large enough to ensure that the total potential is vanishingly small at the box boundary, thus minimizing 
spurious interactions between periodic replicas. For neutral systems with small dipole/quadrupole mo- 
ments, supercells of reasonable size can be safely used. For charged molecules, or molecules with large 
dipole/quadrupole moments, however, the error induced by PBC may be rather large unless exceedingly 
large supercells are used. 

We follow here a technique introduced by Makov and Payne (MP) 11 to eliminate the spurious electro- 
static interactions in the latter case. The MP technique is approximate because it is not self-consistent and 
takes into account only moments up to quadrupole. To check the accuracy of the MP technique, we compare 
CP calculations on highly charged manganese porphyrins, performed using PWs and US PPs, with calcu- 
lations on the same systems using localized basis sets which do not require PBC. The comparison shows 
that the MP correction yields results that to all practical effects are indistinguishable from results obtained 
without PBC. 

The paper is organized as follows. In Sec.2, we recall the main aspects of US-PP implementation in 
the serial case. In Sec. 3, we describe our parallel implementation. In Sec.4, we compare the computer 
performances of US and NC PPs for a reduced and an extended model of the myoglobin active site. In Sec. 5, 
we compare CP calculations with localized basis-set calculations not requiring PBC. The test systems are 
highly-charged isomeric meso-substituted manganese porphyrins. Sec. 6 contains our conclusions. 
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II. PLANE- WAVE ULTRASOFT PSEUDOPOTENTIAL IMPLEMENTATION 
A. Kohn-Sham equations with Ultrasoft Pseudopotentials 

The implementation of CP molecular dynamics within a US-PP framework is described in Ref. 10 . Here 
we briefly remind the main formulas, using the same notation of Ref. 10 . 

The total energy of a system of N v valence electrons, having one-electron Kohn-Sham (KS) orbitals fa, 
is given by 

h 2 

E tot [{fa}, {Rj}] = V(&| - —V 2 + V Nh \fa) + Ea[n] + E xc [n] 
L — 1 2m 

+ JdrV™(r)n(r) + U({R I }) , (1) 

where n(r) is the electron density, Eft{n} is the Hartree energy: 

e 2 f f , n(rWr') 

EH[n] = ^jj drdr i^r ' (2) 

E xc [n] is the exchange and correlation energy, U({Ri}) is the ion-ion interaction energy, and R/ indicate 
atomic positions. In the following, potentials have energy dimensions. The PP is composed of a local part 
Vfo° n , given by a sum of atom-centered radial potentials: 

^°c n (r)=E^oc(|r-R/|) (3) 
I 

and a nonlocal part Vnl: 



^nl= £ D^J^J , (4) 

nm,I 

where the functions (3^ and the coefficients Dnm characterize the PP and are specific for each atomic species. 
For simplicity, we consider a single atomic species only in what follows. The (3^ functions, centered at site 
R/, depend on the ionic positions through 

Piir) = p n (r - Rj) . (5) 

P n is an angular momentum eigenfunction in the angular variables, times a radial function which vanishes 
outside the core region; the indices n and m in Eq. © run over the total number 2Vg of these functions. 
The electron density in Eq. Q is given by 

n(r) = ]T [ |<Mr)| 2 + £ QLi'MlPtiiflnlb) } > (6) 

i nm,I 
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where the sum runs over occupied KS orbitals. The augmentation functions Q^ m (r) = Q nm (r — Hj) are lo- 
calized in the core. The ultrasoft PP is fully determined by the quantities V\ oc {r),Dnm, Qnm( r ), an d /9n( r )- 
The functions Q nm (r) are defined in terms of atomic orbitals as: Q nm (r) = ?/^ e * (r)ijj^ (r) — ipf^* (r)t/j^ (r) 
where ip ae are atomic one-electron orbitals (not necessarily bound), and i)j ps are the corresponding pseudo- 
orbitals. The Q n m(?) are pseudized as described in Ref. 10 . This enables us to treat the Qnm(r) with Fourier 
transform techniques. 

The KS orbitals obey generalized orthonormality conditions 

<&|S({Rj})|&> = <% , (7) 
where S is a Hermitian overlap operator given by 

S = l+Y / Qnm\P I n )(P I m \, (8) 
nm,I 

and 

<?nm = J drQ nm {r). (9) 

The orthonormality condition © is consistent with the conservation of the charge J drn(r) = N v . Note 
that the overlap operator S depends on ionic positions through the 

The ground-state orbitals <pi minimize the total energy Q subject to the constraints ©, 

JW) =t - SUrh <10) 

where the e, are Lagrange multipliers. Eq. (fTTl > is the KS equation: 

H\cl) i )=e i S\(t>i) (11) 

where 

= -^V 2 + Kff + E Dl nm\^n)(0L\ • (12) 
nm,/ 

V^ff is a screened effective local potential, 

F eff (r) = ^° n (r) + Vk(r) + fx xc (r) . (13) 
/i xc (r) is the exchange-correlation potential: 
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and Vn(r) is the Hartree potential: 

Vn(r) = e 2 Jdr'0L. (15) 
The "screened" coefficients appearing in Eq. (fT^T > are defined as: 

Km = D$L + Jdr y eff (r)Q^ m (r) . (16) 
They depend on the KS orbitals through V c s, Eq. (fT3l . and the charge density, Eq. 

B. Molecular dynamics with Ultrasoft Pseudopotentials 

In the CP approach, 2 the electronic orbitals and the ionic coordinates evolve according to a classical 
Lagrangian 

£ = ^E Jdr\Ur)\ 2 + ^Y, M ^ 2 l- E tot(Wi},{^l}) , (17) 
i I 

subject to a set of constraints 

A^({&},{R/}) = <&|S|&> - = . (18) 

Here /i is a fictitious mass parameter for the electronic degrees of freedom, Mj are the masses of the atoms, 
and Etot an d S are given in Eqs. Q and ©, respectively. The holonomic orthonormality constraints (TT8"t 
do not cause energy dissipation in an MD run. They may be incorporated in the Euler equations of motion 
by introducing Lagrange multipliers Ay : 

Fj = MjKj = -^g* +£A^ i |^ i ) . (20) 
7 u 1 

At equilibrium, Eq. dT9t reduces to the electronic KS equations (fTOl or (fTTb . A unitary rotation brings the A 
matrix into diagonal form: A^ = ej^,-. The equilibrium for the ions is achieved when the ionic forces Fj 
in Eq. (l2"0l vanish. 

In deriving explicit expressions for the forces, Eq. d20l . one should keep in mind that the electron density 
also depends on R/ through Qn m and f3^. Introducing the quantities 



J nra / < 



(&|/#>(/#J&) , (2D 
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and 



L = E A *iWn)(/4l0i) > (22) 



we arrive at the expression 

F/ - "m;-/ dr w n(r)_ y dryeff(r) ^^~ pnm 



where D^ m and V^ff have been defined in Eqs. (fT6l and (113b . respectively. The last term of Eq. d23l gives the 
constraint contribution to the forces. Since the PW basis set does not depend on atomic positions, Pulay-type 
corrections 12 do not appear in the expression for the forces. 

C. Discretization of the equation of motion and orthonormality constraints 

The equations of motion ( fl9t and d20t are usually discretized using the Verlet or the velocity-Verlet 
algorithms. The following discussion, including the treatment of the R/ -dependence of the orthonormality 
constraints, applies to the Verlet algorithm when using the Fourier acceleration scheme of Ref. 13 . In this 
framework the fictitious electron mass is represented by an operator 0, whose matrix elements between 
PWs are given by 

©G,G' = max ( fx, Vij—^ j 5g,G> , (24) 

where G, G' are the wave vector of PWs, E c is a cutoff (typically a few Ry) which defines the threshold 
for Fourier acceleration. The fictitious electron mass depends on G as the kinetic energy for large G, it is 
constant for small G. This scheme allows us to use larger time steps with negligible computational overhead. 
The electronic orbitals at time t + At are given by: 

Mt + At) = 2&(t) - Ut ~ At) - (At) 2 ®- 1 -^A (J (t + At) S(t)^(t)] , (25) 

where At is the time step, and S(t) indicates the operator S evaluated for ionic positions Ri(t). Similarly 
the ionic coordinates at time t + At are given by: 

R/(t + At) = 2R/(t) -R/(t- At) 

(At) 2 rdE tot ^ . . , u ,,dS(t) 



EA«(* + A*)(^(t)|^|0,-(O> • (26) 



Mi L dRi ^ Jy M v " dRi 
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The orthonormality conditions must be imposed at each time-step: 

(<j)i(t + At)\S{t + At)\<pj(t + At))= 5ij , (27) 

leading to the following matrix equation: 

A + XB + S t A t + ACA 1 " = 1 (28) 

where the unknown matrix A is related to the matrix of Lagrange multipliers A at time t + At via A = 
(At) 2 A*(t + At). In Eq. d28l the dagger indicates Hermitian conjugate (A = A^). The matrices A, B, and 
C are given by: 

Aij = (^\S(t + At)\^) , 

Bij = (Q-T-SWfaWlSit + Mjfo) , 

dj = (Q- l S(t)Mt)\S(t + At)\Q- l S(t)^(t)) , (29) 

with 

h = 2Ut) ~ Mt - At) - (Atfe- 1 ^!^ . (30) 

The solution of Eq. d28b in the ultrasoft PP case is not obvious, because Eq. d26t is not a closed expression 
for R/(t + At). The problem is that A(t + At) appearing in Eq. (I26t depends implicitly on R/(t + At) 
through S(t + At). Consequently, it is in principle necessary to solve iteratively for R/(t + At) in Eq. d26l . 
A simple solution to this problem is given in Ref. 10 . A(t + At) is extrapolated using two previous values: 

A^ 0) (t + At) = 2A ij (t) - Aij(t - At). (31) 

Eq. d26l is used to find (t + At), which is correct to 0(At 4 ). From Rj (t + At) we can obtain a new 
set Ay (t + At) and repeat the procedure until convergence is achieved. It turns out that in most practical 
applications the procedure converges at the very first iteration. Thus, the operations described above are 
generally executed only once per time step. 

The solution of Eq. d28t is found using a modified version 610 of the iterative procedure of Ref A The 
matrix B is decomposed into hermitian (B^) and antihermitian (B a ) parts, 

B = B h + B a , (32) 

and the solution is obtained by iteration: 

B h + B h \( n+1 "> = l-A- X^B a - B\\^ - AWCAH (33) 
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The initial guess A (o) can be obtained from 

\<®B h + B h \^ = l-A. (34) 

Here the B a - and C-dependent terms are neglected because they are of higher order in At (B a vanishes for 
vanishing At). Eqs. d34l and (1331 have the same structure: 

XB h + B h X = X (35) 

where X a Hermitian matrix. Eq. d33l can be solved exactly by finding the unitary matrix U that diagonal- 
izes Bh\ XJ^ByJJ = D, where Dij = didij. The solution is obtained from 

(U^XU)ij = {tfXU^/idi + dj). (36) 

When X = 1 — A Eq.d36l yields the starting \(°\ while A^ n+1 ^ is obtained from A^ n ^ by solving Eq. d36t 
with X given by Eq. d3*3l . This iterative procedure usually converges in ten steps or less. 



D. Ultrasoft pseudopotential implementation in the serial case 

1. Plane-wave expansion 

Let {R} be the translation vectors of the periodically repeated supercell. The corresponding reciprocal 
lattice vectors {G} obey the conditions Rj • Gj = 2irn, with n an integer number. 
The KS orbitals can be expanded in a PW basis up to a kinetic energy cutoff £ c wf : 



where £1 is the volume of the cell, {G™ f } is the set of G vectors satisfying the condition 



^ k (r) = -= £ <MG)e" ' " (37) 



n 2 

2m 



k + G\ z <E™, (38) 



and k is the Bloch vector of the electronic states. In crystals, one must use a grid of k-points dense enough 
to sample the Brillouin zone (the unit cell of the reciprocal lattice). In molecules, liquids and in general if 
the simulation cell is large enough, the Brillouin zone can be sampled using only the k = (T) point. An 
advantage of this choice is that the orbitals can be taken to be real in r-space. In the following we will drop 
the k vector index. Functions in real space and their Fourier transforms will be denoted by the symbols, if 
this does not originate ambiguity. 
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The </>j(G)'s are the electronic variables. The calculation of H<fij and of the forces acting on the ions are 
the basic ingredients of the computation. Scalar products (<t>j\(3n) an d their spatial derivatives are typically 
evaluated in G-space. An important advantage of working in G-space is that atom-centered functions like 
(3 l n and Q,\ im are easily evaluated at any atomic position, for example: 

/#(G) = p n (G)e- tG - R ' . (39) 

Thus: 

Wn>= E $(G)A,(G)e- iG - R ' , (40) 
Ge{G c w f } 

and 

tti\^) = -i E G#(G)A,(G)e- iG - R ' . (41) 

The kinetic energy term is diagonal in G-space and is easily calculated: 

-(V 2 ^)(G) = G%-(G). (42) 
In summary, the kinetic and nonlocal PP terms in H(j)j are calculated in G-space. 

2. Dual space technique 

The local potential term V c g(j)j could be calculated in G-space, but it is more convenient to use a different 
("dual space") technique. The idea is to switch from G- to r-space, back and forth, using FFT, and to perform 
the calculation in the space where it is more convenient. The KS orbitals are first Fourier-transformed to 
r-space; then, (V^ff^)(r) = V e g(r)(j)j(r) is calculated in r-space, where V e g is diagonal; finally (V e g(j)j)(r) 
is Fourier-transformed back to (V e g(/)j)(G). 

In order to use FFT, one discretizes the r-space by a uniform grid spanning the unit cell: 

f(m 1 ,m 2 ,m 3 ) = f(r mmm3 ), = mi^- + m 2 ^+m 3 ^, (43) 

ivi iV2 iV3 

where ai, a.2, &3 are lattice basis vectors, the integer index mi runs from to Ni — 1, and similarly for 771,2 
and 7773. In the following we will assume for simplicity that N\,N2, N3 are even numbers. The FFT maps a 
discrete periodic function in real space /(mi, 7772, 771,3) into a discrete periodic function in reciprocal space 
/(ni, 7i2, 773) (where 771 runs from to Ni — 1, and similarly for 772 and 773), and vice versa. 
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The link between G-space components and FFT indices is: 

f(n 1 ,n 2 ,n 3 ) = /(G^^yJ, G^^ = n[bi + n' 2 b 2 + n' 3 b 3 (44) 

where m = n[ if n[ > 0, m = n' x + N\ if n' x < 0, and similarly for n 2 and n^. The FFT dimensions 
Ni, N2, N3 must be big enough to include all non negligible Fourier components of the function to be 
transformed: ideally the Fourier component corresponding to n[ = N\/2, and similar for n' 2 and n' 3 , should 
vanish. In the following, we will refer to the set of indices n\ , n 2 > n 3 and to the corresponding Fourier 
components as the "FFT grid". 

The soft part of the charge density: n so f t (r) = J2j l<^j( r )| 2 > contains Fourier components up to akinetic 
energy cutoff E* o{t = 4£' c wf . This is evident from the formula: 

n so ft(G)= J2 £#( G - G %-( G ')- ( 45 ) 
G'e{G« f } j 

In the case of NC PPs, the entire charge density is given by n so f t (r). 

V e g should be expanded up to the same £ c soft cutoff since all the Fourier components of V c s<j)j up to 
E™ { are required. Let us call {G<? oft } the set of G- vectors such that 

—G 2 < E c so{t . (46) 
2m 

The soft part of the charge density is conveniently calculated in r-space, by Fourier-transforming 4>j(G) 
into 4>j(r) and summing over the occupied states. 

The exchange-correlation potential /i xc (r), Eq. (fl4b . is a function of the local charge density and - for 
gradient-corrected functionals - of its gradient at point r: 

Mxc(r) =Kc(n(r),|Vn(r)|) . (47) 

The gradient Vrt(r) is conveniently calculated from the charge density in G-space, using (Vn)(G) = 
— iGn(G). The Hartree potential Vn(r), Eq. (fT3T >. is also conveniently calculated in G-space: 

Thus in the NC-PP case, a single FFT grid, large enough to accommodate the {G c soft } set, can be used for 
orbitals, charge density, and potential. 

The use of FFT is mathematically equivalent to a pure G-space description (we neglect here a small 
inconsistency in exchange-correlation potential and energy density, due to the presence of a small amount 
of components beyond the {G^ oft } set). This has important consequences: working in G-space means 
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that translational invariance is exactly conserved and that forces are analytical derivatives of the energy 
(apart from the effect of the small inconsistency mentioned above). Forces that are analytical derivatives 
of the energy ensure that the constant of motion (i.e., the sum of kinetic and potential energy of the ions in 
Newtonian dynamics) is conserved during the evolution. 

3. Double-grid technique 
Let us focus on US PPs. In G-space the charge density is: 

n(G)=n soft (G)+ £ QL(G)(&I#></&I&> ■ ( 49 ) 

i,nm,I 

If £ c wf is the cutoff for the KS orbitals, the cutoff for the soft part of the charge density is £' c soft = 4E c wf . The 
augmentation term often requires a cutoff higher than i? c soft , and as a consequence a larger set of G- vectors. 
Let us call {G<? cns } the set of G- vectors that are needed for the augmented part: 

h 2 

—G 2 < E c dens . (50) 
2m 

In typical situations, using pseudized augmented charges, E dens ranges from E* oit to ~ 2 — 3£ , c soft . 

The same FFT grid could be used both for the augmented charge density and for KS orbitals. This 
however would imply using an oversized FFT grid in the most expensive part of the calculation, dramatically 
increasing computer time. A better solution is to introduce two FFT grids: 

• a coarser grid (in r-space) for the KS orbitals and the soft part of the charge density. The FFT 
dimensions N\, N2,N% of this grid are big enough to accommodate all G- vectors in {Gc° ft } 

• a denser grid (in r-space) for the total charge density and the exchange-correlation and Hartree po- 
tentials. The FFT dimensions Mi > N\,M2 > iV2, M3 > N3 of this grid are big enough to 
accommodate all G- vectors in {G<f cns }. 

In this framework, the soft part of the electron density n so f t , is calculated in r-space using FFTs on the 
coarse grid and transformed in G-space using a coarse-grid FFT on the {G c soft } grid. The augmented charge 
density is calculated in G-space on the {G<? cns } grid, using Eq. d49b as described in the next section. n(G) 
is used to evaluate the Hartree potential, Eq. d48b . Then n(G) is Fourier-transformed in r-space on the dense 
grid, where the exchange-correlation potential, Eq.dTTt. is evaluated. 

In real space, the two grids are not necessarily commensurate. Whenever the need arises to go from 
the coarse to the dense grid, or vice versa, this is done in G-space. For instance, the potential V e g, Eq. 
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( fTSl . is needed both on the dense grid to calculate quantities such as the D^ m , Eq. (fTol l. and on the coarse 
grid to calculate V e g(j)j, Eq. (ITT1 . The connection between the two grids occurs in G-space, where Fourier 
filtering is performed: V c g is first transformed in G-space on the dense grid, then transferred to the coarse 
G-space grid by eliminating components incompatible with E^ oft , and then back-transformed in r-space 
using a coarse-grid FFT. 

We remark that for each time-step only a few dense-grid FFT are performed, while the number of neces- 
sary coarse-grid FFTs is much larger, proportional to the number of KS states N^ s . 

4. Augmentation boxes 

Let us consider the augmentation functions Q nm , which appear in the calculation of the electron density, 
Eq. (Effil i. in the calculation of D^ m , Eq. (IToT i. and in the integrals involving dQ^ m / dUi needed to compute 
the ionic forces, Eq. d23"t . The calculation of the Q nm in G-space has a large computational cost because 
the cutoff for the Q nm is the large cutoff 7? c dens . The computational cost can be significantly reduced if we 
take advantage of the localization of the Q nm in the core region. 

We call "augmentation box" a fraction of the supercell, containing a small portion of the dense grid in 
real space. An augmentation box is defined only for atoms described by US PPs. The augmentation box for 
atom 7 is centered at the point of the dense grid that is closer to the position R/. During a MD run, the center 
of the 7— th augmentation box makes discontinuous jumps to one of the neighboring grid points whenever 
the position vector R/ gets closer to such grid point. In a MD run, the augmentation box must always 
contain completely the augmented charge belonging to the 7— th atom; otherwise, the augmentation box 
must be as small as possible. Augmentation boxes of different sizes for different atoms could in principle 
be used, but in our implementation the same box size is chosen for all the atoms. Thus the atomic species 
having the less localized augmented charge determines the size of all the augmentation boxes. 

The volume of the augmentation box is much smaller than the volume of the supercell. The number of 
G-vectors in the reciprocal space of the augmentation box is smaller than the number of G-vectors in the 
dense grid by the ratio of the volumes of the augmentation box and of the supercell. As a consequence, the 
cost of calculations on the augmentation boxes increases linearly with the number of atoms described by US 
PPs. 

Augmentation boxes are used twice in the calculation: 

(i) to construct the augmented charge density, Eq. ©, 

(ii) to calculate the self-consistent contribution to the coefficients of the nonlocal PP, Eq. (fTol l. 
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In case (i), the augmented charge is conveniently calculated in G-space, following Ref. , and Fourier- 
transformed in r-space. All these calculations are done on the augmentation box grid. Then the calculated 
contribution at each r-point of the augmentation box grid is added to the charge density at the same point in 
the dense grid. In case (ii), it is convenient to calculate D^ m as follows: for every US atom, take the Fourier 
transform of V e s(r) on the corresponding augmentation box grid and evaluate the integral of Eq. (fTol l in 
G-space. 

III. PARALLEL ULTRASOFT PSEUDOPOTENTIAL IMPLEMENTATION 

Various parallelization strategies for PW-PP calculations have been described in the literature. A strategy 
that ensures excellent scalability in terms of both computer time and memory consists in distributing the 
PW basis set and the FFT grid points in real and reciprocal space across processors. A crucial issue for the 
success of this approach is the FFT algorithm, which must be capable of performing three-dimensional FFT 
on data shared across different processors with good load balancing 4 . This algorithm can be generalized to 
the US case as described in the following subsection. 

A. Parallel FFT in the US case 

Partitioning a real-space FFT grid across processors is straightforward. The FFT grid, Eq. d43l . is sub- 
divided in a number of slices equal to the number of processors, so that each processor can take care of a 
different slice. The slices are cut along planes orthogonal to a chosen crystallographic direction. We label 
the crystallographic directions by 1,2,3. For instance, let us consider a FFT grid with N3 planes along direc- 
tion 3, which is distributed across N p processors. If N p is a divisor of .ZV3, good load balancing is achieved 
if each slice contains the same number (N^/Np) of planes. Processor p contains planes with 711,3 values such 
that: (p — l)(Ns/N p ) < 1113 < p(N^/N p ) — 1. If N p is not a divisor of N3, all the slices cannot be equal. In 
this case their dimension is chosen in such a way as to minimize load imbalance. If N p exceeds the number 
of planes N3, this strategy has to be refined. 

The partition of the G-space grid is more involved. The Fourier components of the quantities of interest 
(e.g. the orbitals, the charge density, etc.) are stored as vectors (one-dimensional arrays): f(i) = f(G{), 
where the index i spans one of the three sets of G- vectors defined above, namely the set {G™ f }, the set 
I q soft 1 ^ an( j ^ set |G^ cns }. When a FFT is needed, the Fourier components have to be transferred to one 
of the two grids (three-dimensional arrays), defined by Eq. d44l . The two grids are either the coarse grid, 
with dimensions Ni, N2, N3, or the dense grid, with dimensions M±, M2, M3. The Fourier components 
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must be evenly distributed across processors in order to achieve optimal load balancing for operations like 
scalar products. At the same time, their distribution across processors should achieve good load balancing 
in the FFTs and minimize the amount of data communication needed to perform the FFTs. 

For each pair n' l5 n' 2 in Eq. d44l we define a "column" in G-space, including all G n ^ n ^ n ^ with — M3/2 < 
n 3 < M3/2. Since the KS orbitals have nonzero Fourier component only for G- vectors belonging to the set 
{G ™ f }, only a subset of all the columns contributes to a one-dimensional FFT of a KS orbital in the direction 
3. We call these columns "active columns" for the set {G™ f }. In general, the number of nonzero Fourier 
components is different for each active column. Ideally, we would like to distribute the active columns across 
the processors, so that each processor receives the same number of active columns and the same number of 
Fourier components. Although not possible in general, this can be achieved to a good extent with a simple 
algorithm 1 —: 1) create a list of columns, ordered by decreasing number of nonzero Fourier components; 
2) assign the column to the processors, following the order in the list; 3) when all the processors contain 
at least one column, assign the following column in the list to the processor with the smallest number of 
nonzero Fourier components. This algorithm works nicely when the number of columns per processor is 
large enough. 

After assigning to the processors all the columns that are active for the set {G™ f }, we distribute across 
the processors the remaining columns, that are active for the set {G c soft }, using the same algorithm. Finally 
we distribute across the processors the remaining columns, that are active for the set {G c dcns }, again using 
the same algorithm. The remaining columns are not active for any set of G- vectors and play no role. 

After distributing all the columns across the processors, a one-dimensional FFT along direction 3 is 
done on local data (on a single processor). However, the data on the planes orthogonal to direction 3 are 
distributed across the processors. In order to perform FFTs in each of these planes, the corresponding data 
must be made local to a processor. This is achieved by a parallel transpose operation, performed with a 
single call to the appropriate MPI 15 library routine. Two-dimensional FFTs can then be performed on the 
planes, with each processor operating on local data. Nonzero contributions are present only for (ni,n' 2 ) 
pairs corresponding to active columns. This fact can be exploited to reduce the number of FFT operations, 
by performing only the FFTs along direction 1 (or 2) that include nonzero contributions. The strategy for 
parallel three-dimensional FFT that we have presented requires the number of processors N p be smaller 
than or equal to the number of planes N%. The FFT from r- to G-space uses the same algorithm in reversed 
sequence. 

In calculations using only the T point (k = 0), the KS orbitals can be chosen to be real functions in 
r-space, so that 0(G) = 4>*(— G). This allows us to store only half of the Fourier components. Moreover, 
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two real FFTs can be performed as a single complex FFT. To this end the auxiliary function is introduced: 

*(r) = i (r)+»0 i+ i(r) (51) 

whose Fourier transform <&(G) yields 

$(G) + $*(G) 

^(G) = (52) 

<3>(G) - S>*(G) 

<A?+i(G) = • (53) 

A side effect on parallelization is that G and — G must reside on the same processor. As a consequence, 
pairs of columns with G n / )7l ^ jn / and G- n ' - n ' 2 ,n' 3 (with the exception of the case n[ = n' 2 = 0), must be 
assigned to the same processor. 



B. Scalar products 

All scalar products (f\g) = J2i fiQii i = 1, w where i runs on a distributed grid, can be calculated by 
calling standard optimized library routines (like BLAS from NetLib 17 ) on each processor, and subsequently 
by summing the partial results of all processors, using a call to standard MPI 15 libraries. Scalar products 
between vectors for which only half of the Fourier components are stored require a special treatment. Let n p 
be the number of Fourier components stored on processor p. The contribution of this processor to the scalar 
product is (f\g) p = 2J2i=i,n p fi9i if tne G = components are not within the set of n p components. If 
instead the G = components, identified by i = 1, are stored on processor p, the contribution of processor 
p to the scalar product is (f\g) p = f x g 1 + 2 J2i=2,n p 



C. Iterative Orthonormalization 



The scalar products in the matrix elements, Eq. ( l2^t . needed to compute the Lagrange multipliers are 
calculated in parallel, following the procedure of the previous subsection. In the present implementation, the 
solution of the matrix equation d35b . involving square matrices of dimension equal to the number N^ s of KS 
orbitals, is not parallelized but replicated on all the processors. Usually the time spent in the non parallelized 
part of the iterative orthonormalization is only a small fraction of the total time of the calculation. 

To efficiently perform calculations on very large systems, using a large number of processors, the solution 
of Eq. d33t should also be parallelized. The time consuming steps are matrix-matrix multiplication and the 
diagonalization of the B matrix. Both calculations require 0(N^ S ) floating-point operations. A convenient 
parallelization approach is described in Ref. 6 . 
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D. Augmentation boxes 

The parailelization of the calculations performed on the augmentation boxes is not obvious for two 
reasons: 1) each augmentation box has a grid which is a portion of the dense grid and is distributed across 
processors; 2) the boxes follow the atoms in the MD evolution, causing the portion of the dense grid to 
change with time. In the present implementation, we deal with these difficulties as follows. We keep on 
all processors a copy of all the quantities defined on the augmentation boxes. Calculations on the grid of 
a given augmentation-box are performed only in the processors that contain at least a fraction of the given 
augmentation-box grid. This causes some replication of the calculations. FFTs on the augmentation box 
grid are performed locally on each processor. In order to reduce the amount of replication, in the FFTs from 
G- to r-space, the two-dimensional FFTs along planes orthogonal to direction 3 are performed only in the 
planes belonging to the slice of the dense grid that is local to a given processor. No communication is needed 
to copy the augmented charge in r-space from the augmentation-box grid to the dense grid (see Sec. Ill D 4t . 
In the calculation of D^ m , we evaluate Qn m in G-space, transform it in r-space using augmented-box FFT, 
evaluate the integral of Eq. (fTol l in r-space,and sum the final result over all processors. This approach keeps 
communications to a minimum, at the expense of a number of augmentation-box FFTs larger than in the 
serial case. 

Augmentation-box grid related calculations constitute a very small part of the overall computational cost, 
both in computer time and in memory. Therefore the simple approach that we have presented is convenient 
even if some calculations are replicated on few processors and the load balance is not optimal. 

IV. TEST CASE: IRON PORPHYRINS 

We report here a comparison of computer performances for US and standard PPs in CP calculations. 
Our test systems-prototypes of systems containing the iron-porphyrin motif-are a reduced and an extended 
model of the active site of myoglobin. 

A. Models and computational details 

The reduced model is composed of an iron-porphyrin-imidazole complex, already investigated using the 
CP method by Rovira et al. 18 ; the metallic pentacoordinated center is bound to the four planar porphyrin 
nitrogens, with the imidazole nitrogen occupying one of the axial sites, binding approximately orthogonal 
to the porphyrin plane, see Figure 1. The chemical formula is [FeNgC23Hi6]. A simple cubic cell of size 
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Figure 1. Reduced model: the iron-porphyrin-imidazole complex. Yellow: Fe. Dark gray: C. Blue: N. Light gray: H. 

15.875 A, containing a total of 46 atoms and 154 electrons, is used. For the reduced model, we compare 
both spin-restricted and unrestricted (S=2) calculations. 

The extended model is composed by a large portion of the myoglobin active site, defined by the full heme 
group (same coordination as for the reduced model) plus the 13 surrounding residues which were comprised 
within a sphere of 8 A radius centered on the iron atom, see Figure 2. The initial geometry has been taken 
from the X-ray experimental structure of the 02-myoglobin complex 19 and the included residues have been 
terminated by NH2 groups, resulting in a total of 332 atoms and 902 electrons. The chemical formula is 
[FeOigN35Cio6Hi73]. A simple cubic cell of size 25.4 A has been used, ensuring a minimum separation 
of 5 A between periodic replicas. For the extended model we discuss only the performances of the more 
computationally demanding spin-unrestricted (S=2) calculations. 

For a correct comparison of performances we need to compare data of similar quality, in terms of ac- 
curacy of chemical properties, obtained with algorithms of comparable quality in terms of serial speed and 
parallel speedup. In order to satisfy the first requirement, we need to determine a set of cutoffs for the US 
and standard calculations yielding comparable structural properties. To this end we compared the optimized 
geometry of the triplet ground state for the reduced model from our US calculations and from published NC 
results, performed at E™ f = 70 Ry 18 . Our geometrical parameters are converged to the same extent as those 
in Ref. 18 at E™ { = 25 Ry and E^ ens = 200 Ry; in particular the critical Fe-N distances in the porphyrin 
and in the imidazole are found to be 2.00 and 2.12 A respectively, vs. 2.00 and 2.14 A of RefJ£, with the 
out-of-plane displacement of the iron atom computed to be 0.14 vs. 0.15 A. The overall agreement between 
the two sets of results is excellent, and the residual difference can be attributed to the different functionals 




Figure 2. Extended model of the myoglobin active site. Red: O, other colors as in Figure 1. 
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used: BP86 20 in Ref. 18 , and PW91 21 in our calculations. We estimate that we can safely compare US-PP 
calculations performed at E™ { = 25 Ry to standard calculations at E™ f = 70 Ry. For a fair comparison 
we use the same cutoff for the charge density in the US case as in the standard case 

( £dens = 28Q Ry) 

We also performed calculations at E™ f = 35 Ry for the US case, at E™ = 100 Ry for the standard case 
(£dens = 400 Ry in both cases) 

In order to compare algorithms of similar quality, all calculations were performed using the same code 22 
(standard PPs are just a special case of US PPs). The PPs used in the standard case tests were generated 
using the technique of Troullier and Martins 23 . In the US case, we use US PPs for all atoms, including H. 
The PW91 functional 21 is used in all calculations. 

B. Results 

The results for the reduced models were obtained on a 32-node IBM SP3 (4 x 375MHz power3 processors 
per node), while the extended model calculations were performed on a 64-processor SGI Origin (64 x 
300MHz RISC 12000 processors), both at the Keck Materials Science Laboratory, Princeton University. 

The reported execution times are an average over 20 time steps of the measured wall time (the sum of 
CPU and system time, differing by only a few percent from pure CPU time). 

TABLE I: Performances of the US and NC calculations, for the spin-restricted case. For US PPs: E™ 1 = 25 Ry, 
£dens _ Ry, FFT grid size: 160,96,16 for the dense, coarse, and augmentation box grids, respectively. For NC 
PPs: £ c wf = 70 Ry, £ c dons = 280 Ry, FFT grid size: 160. N p is the number of processors; M r is an estimate of the 
RAM needed per processor in Mb; T e is the execution time per electronic time step (at fixed atoms), in s; Tj is the 
same as T e per CP time step (atoms moving), in s. 





US 


NC 


N p 


Mr T e Ti 


M r 




Ti 


4 


190 49.5 56.2 


357 


136.1 


159.7 


8 


100 23.6 26.1 


139 


44.6 


49.8 


16 


57 10.7 12.1 


77 


21.0 


22.9 



The parallelization performances of the US vs. NC-PPs implementation for the spin-restricted case of 
the reduced model are shown in Table 1. The execution times for a CP step (including calculation of forces 
and time evolution of atomic positions) are about 15 % larger than for a purely electronic step (orbital 
time evolution only), as expected. The small superlinear speedup observed both for US and NC PPs is 
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a consequence of caching: since the memory per processor decreases almost linearly with the number of 
processors, better caching can be achieved with an increased number of processors, thus increasing the serial 
speed of the code. It is worth noting that US calculations are faster by a factor ~ 2.5 with respect to the NC 
case and require half RAM memory and 1/4 disk space with respect to standard calculations. 

The performances of US vs. NC-PPs calculations at higher cutoff are shown in Table 2. The number of 
PWs is approximately (35/25) 3//2 ~ 1.65 times larger than in the preceding case. Execution times should 
theoretically be proportional to the same factor. The factor is actually somewhat larger (~ 1.9), but the cache 
effects mentioned above and the effect of the discreteness of the FFT grid explain the difference. Again, US 
calculations are faster by a factor ~ 2.5 with respect to the NC case and require half RAM memory with 
respect to standard calculations. 

In Table 3 we report spin-unrestricted results, showing an approximate doubling of execution time and 
of memory requirements, in line with expectations. 

TABLE II: Performances of the US calculations, spin-restricted case at higher cutoff. For US PPs: E™ { = 35 Ry, 
fldens _ ^QQ £y pFT grid size: 192,120,20 for the dense, coarse, and augmentation box grids, respectively. For NC 
PPs: £ c wf = 100 Ry, £ c dons = 400 Ry, FFT grid size: 192. The meaning of the various columns is the same as in 
Table 1. 





US 


NC 


N p 


M r T e T, 


M r 




8 


167 44.7 50.1 


307 


118.7 143.7 


16 


93 20.9 23.0 


157 


50.5 57.2 



TABLE III: Performances of the US calculations: spin-unrestricted calculations, same cutoffs as in Table 1. 



N p 


M r T e 


T 


4 


263 92.5 


104.5 


8 


139 44.6 


49.8 


16 


77 21.0 


22.9 



Table 4 contains the performances of US calculations on the quintet state (S = 2), which is experimen- 
tally known to be the ground spin state in myoglobin, of our extended model of myoglobin at E™ { = 25 Ry 
and E^ cns = 200 Ry. The scaling with the number of processors is excellent in this case too, with a slight 
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TABLE IV: Performances of the US calculations: spin-unrestricted calculations for the extended model, = 25 
Ry, £ c dons = 200 Ry, FFT grid size: 224, 160, 20 for the dense, coarse, and augmentation box grids, respectively. 



N p 


M r T e 


T 


16 


1067 3011 


3764 


32 


656 1407 


1690 


48 


441 992 


1190 



superlinear scaling up to 32 processors. The ratio between execution times for electronic and CP time-steps 
is almost the same in the extended and in the reduced models. A geometry optimization time step requires 
less than 20 minutes in the spin-unrestricted case on 48 processors, with a total RAM usage of ~ 21 Gb. The 
total size of the file containing the KS orbitals is 6.7 Gb. In this case no standard calculation was attempted; 
indeed, an extrapolation from the results of Tables 1, 3 and 4 points to a total memory requirement of more 
than 40 Gb. 

A typical local geometry optimization requires ~ 250 time-steps. An execution time of 20 minutes per 
time step thus translates into less than 4 days for the optimization to complete. This is a perfectly feasible 
calculation, not even requiring a state-of-the art massive parallel computer. On the other hand, a typical 
MD run requires no less than 10000 time-steps, corresponding in the present case to a few ps of simulation 
time. A true dynamical simulation would therefore become accessible on a state-of-the art massive parallel 
computer. 

The relevance of simulating extended portions of myoglobin active site can be understood by considering 
the spin density distribution of the quintet spin state of the extended model, reported in Figure 3 together 
with selected integrated spin density values. The spin density is mainly localized on the iron atom (ca. 
88 %), even though a sizable contribution (ca. 12 %) is computed to be delocalized over the rest of the 
system, with largest contributions arising from the propionate groups bound to the porphyrin ring. This 
finding is of particular interest, considering that CO rebinding in myoglobin has been recently related to a 
spin crossover from the quintet spin state, corresponding to unbound CO + myoglobin, to the singlet spin 
state characterizing the bound configuration of the CO-myoglobin complex. 24 Since explicit inclusion of the 
protein environment alters the spin distribution of the quintet state in our extended model, an effect on the 
relative energy of the different spin states can be expected. 
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Figure 3. Isodensity contour plot (contour value 0.02) and selected integrated values of the spin density for the 
quintet state of myoglobin extended model. Hydrogen atoms have been omitted for clarity. 

V. ACCURACY OF PERIODIC BOUNDARY CONDITIONS 



The use of PBC to describe molecules is perfectly appropriate for neutral molecules with small 
dipole/quadrupole moments, provided that the chosen supercell is large enough to minimize the spurious 
interactions between periodic replicas. This goal can usually be reached with supercells that leave a few A 
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of empty space between periodic replicas. 

Charged molecules should be described by charged supercells, but these have infinite electrostatic energy. 
A finite energy is obtained by setting to zero the divergent G = contributions to the energy, as if the system 
were neutral. This is equivalent to adding a neutralizing background. Energies obtained in this way will be 
referred to as "uncorrected". The direct comparison of uncorrected energies between different charge states 
is usually meaningless, because the error induced by PBC is large in this case. The long-range character of 
Coulomb interactions would require unpractically large supercells. Uncorrected energies may be affected 
by a large error also in molecules with large dipole/quadrupole moments. 

Several techniques have been devised to overcome such limitation. The Hockney technique 27 yields 
an exact treatment of charged species using PWs without imposing PBC. This is achieved by cutting the 
Coulomb potential in real space beyond a suitably chosen cutoff that excludes all spurious interactions be- 
tween periodic replica, still taking into account intramolecular interactions. This technique is rather expen- 
sive, since it requires the definition of an enlarged FFT grid for the Coulomb potential. A similar technique 28 
where the cutoff acts in reciprocal space allows for faster execution with minor loss of accuracy. 

A much simpler and approximate technique, due to Makov and Payne (MP) 11 , consists in calculating 
the leading electrostatic correction terms and removing them from the uncorrected total energy. The MP 
correction is performed a posteriori on the energy only: the effect of the charge on the potential and on 
atomic forces is therefore neglected. 

The Makov-Payne corrected energy Emp for a cubic supercell has the form: 

where E is the uncorrected energy, q is the net charge, Q is the quadrupole moment, a is the Madelung 
constant, L is the supercell side. This is Eq.(15) of Ref. 11 with the correct sign. When applying Eq. <54b . 
the origin has to be translated so that the dipole moment vanishes 11 . Therefore, the calculation of the 
Makov-Payne correction is straightforward since it requires only calculation of the dipole and quadrupole 
moments. 



A. Models and computational details 



We want to verify the ability of PW-PP calculations with the MP correction to reproduce the electronic 
and structural properties of highly charged species. We compute the energy difference between two meso- 
substituted Mn(V) porphyrins: the oxo-aquo-Mn(V)TM-2-Pyridyl (2-Pyp) and -Mn(V)TM-4-Pyridyl (4- 
Pyp) porphyrins, I (see Figure 4), and between the corresponding oxo-hydroxo species, II, in which the 
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Figure 4. 2-Pyp (upper panel) and 4-Pyp (lower panel) isomers of the oxo-aquo (left panel) and oxo-hydroxo (right 
panel) Mn(V) porphyrins. Mn is the light blue atom, the green atom signals the C of the methyl group bound to a N 
in the aromatic ring, whose position differs in the two isomers. Other colors are as in Figure 2. 

axial water molecule has been replaced by a OH - ligand. We compare our results to calculations employ- 
ing a localized basis set of Slater type orbitals (STO). The oxo-aquo and oxo-hydroxo porphyrins, recently 
experimentally characterized as mimics of the halide oxidation reaction catalyzed by haloperoxidases 25 26 , 
have a charge +5 and +4, respectively, with 4 positive charges approximately localized on the aryl ni- 
trogens and, in the case of the oxo-aquo species, the residual positive charge located at the metal center; 
the two isomeric porphyrins differ, both in the oxo-aquo and oxo-hydroxo form, for the position of the 
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methyl-substituted nitrogen in the aromatic ring attached to the meso porphyrin carbons, which should lead 
to a considerable increase of the quadrupole moment from the 2-Pyp to the 4-Pyp isomer. Due to charge 
differences between oxo-aquo and oxo-hydroxo species, to the high total charge and to the large expected 
difference in the quadrupole moment between the 2-Pyp and 4-Pyp porphyrins, we believe that the calcu- 
lation of the relative energies of the two isomeric porphyrins represents a severe test for PW calculations 
within PBC. 

We consider a reduced model of the real systems, in which the methyl groups bound the the aryl nitrogens 
are replaced by hydrogens. The US-PP results are compared to those obtained by using STOs in the frozen 
core approximation. US-PP calculations were performed by using a cutoff of 25-200 Ry for the KS orbitals 
and density, respectively, and a cubic cell of side 19.05 A, without any symmetry constraints. 

STOs results were obtained using the ADF progra m 29 i 3Q ; the frozen cores include \s-2p states for Mn, 
Is states for O, N and C. The KS orbitals were expanded in an uncontracted DZ STO, standard basis set 11^- 
for all atoms with the exception of the transition metal for which we used a standard basis set IV^i, of TZP 
quality. STOs calculations were performed within C2 V and Cs symmetry constraints for species I and II, 
respectively. 

B. Results 

In Table 5 we compare the relative energy of the singlet ground states 25 of the two isomeric 2-Pyp and 
4-Pyp porphyrins, for both the oxo-aquo and oxo-hydroxo species. For the PP calculations we report both 
uncorrected and Makov-Payne corrected energy differences. Table 6 contains the calculated values of the 
quadrupole moment used in Eq. d54b . 

Results obtained with STOs localized basis sets compute the oxo-aquo 4-Pyp system to be ca. 26 
kcal/mol more stable than the 2-Pyp one. The Makov-Payne corrected US-PP results are in excellent agree- 
ment with STOs results, while uncollected PP results indicate instead that the 4-Pyp isomer is more stable 
than the 2-Pyp one by only 9.5 kcal/mol. Moreover, in the case of the oxo-hydroxo species, the uncorrected 
results yield an incorrect energy ordering, with the 2-Pyp isomer computed to be more stable than the 4-Pyp 
one by 7.0 kcal/mol. The discrepancy is resolved upon correcting the total energies with the Makov-Payne 
term, resulting again in an excellent agreement with the STOs energy differences. 

Interestingly, the geometrical structures of the investigated charged systems calculated using the US-PP 
approach with PBC, turn out to quantitatively compare with results obtained using localized basis sets, see 
Table 7 for a comparison of main optimized geometrical parameters of species I. The main discrepancy 
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TABLE V: Energy differences (kcal/mol) between the two isomeric porphyrins, for the oxo-aquo (first row) 

and oxo-hydroxo (second row) species, computed with localized STOs basis and with US PPs, with Makov-Payne 
(MP) correction and uncorrected. 









US PPs 






STOs 


MP no-MP 


I 


A£ 2 -4 


26.4 


24.5 9.5 


II 


AS2-4 


11.5 


11.1 -7.0 



TABLE VI: Quadrupole moments (a.u.) calculated with US PPs for the 2-Pyp (Q2) and 4-Pyp (Q4) isomers of the 
oxo-aquo (first row) and oxo-hydroxo (second row) species. 





Q 2 


Qi 


I 


518.1 


627.2 


II 


412.8 


522.8 



(0.03 A) is computed for the formally triple Mn=0 bond, probably because of the lack of polarization 
functions in the O STO basis set which in turn leads to an overestimate of such parameter. The agreement 
between US PPs and STOs results suggests that the error introduced on the electrostatic potential by the 
presence of a charge in PBC does not significantly affect the structural properties. On the other hand, the 
effect on the total energy is sizable but mostly corrected by the use of Eq. d54l . 

C. Conclusions 

We believe that our results demonstrate that the Car-Parrinello approach in conjunction with ultrasoft 
pseudopotentials represents a valuable and relatively cheap tool to describe the electronic and geometrical 
properties of complex bio-inorganic systems, including highly charged and open-shell species. The study of 
the electronic and geometrical properties of such systems can now be achieved at a reasonable computational 
cost on conventional parallel machines with a limited number of processors. Moreover, a simple correction 
allows us to calculate energy differences in charged systems with an accuracy that is comparable to that of 
localized basis-set calculations not using Periodic Boundary Conditions. First-principle molecular dynamics 
simulations of extended bio-inorganic systems may still be feasible only on state-of-the art massive parallel 
computer. The key to reach such a goal is the availability of parallel machines with increased performances 
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TABLE VII: Main optimized geometrical parameters (A, degree) for the 2-Pyp and 4-Pyp oxo-aquo isomers (I), 
computed with localized STOs basis and with US PPs. 





2-Pyp 


4-Pyp 




US PPs 


STOs 


US PPs 


STOs 


tMriEO 


1.54 


1.57 


1.54 


1.57 


lMn-OH 2 


2.21 


2.20 


2.22 


2.21 


TMn— N av. 


2.04 


2.02 


2.04 


2.02 


L N-Mn-N par 


163.3 


163.5 


163.4 


163.8 


Z N-Mn-N per 


167.8 


166.6 


166.9 


165.4 



and number of processors and highly optimized scalable algorithms. We believe that the present parallel 
implementation of the Car-Parrinello method using ultrasoft pseudopotentials provides such an algorithm 
that will allow in the near future the simulation of the dynamical properties of such complex systems. 
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